Similar resources
Parallel LU Factorization on GPU Cluster
This paper describes our progress in developing software for performing parallel LU factorization of a large dense matrix on a GPU cluster. Three approaches, with increasing software complexity, are considered: (i) a naive “thunking” approach that links the existing parallel ScaLAPACK software library with cuBLAS through a software emulation layer; (ii) a more intrusive magmaBLAS implementation...
LU Factorization on Parallel Computers
A new parallel algorithm for the LU factorization of a given dense matrix A is described. The case of banded matrices is also considered. This algorithm can be combined with the method of Sameh and Brent [SIAM J. Numer. Anal. 14, 1101-1113 (1977)] to obtain the solution of a linear system of algebraic equations. The arithmetic complexity for the dense case is in’ ($bn in the banded case), using ...
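The abstract does not spell out its parallel algorithm, but the overall pipeline it describes (factor A = LU, then obtain the solution of the linear system through triangular solves) can be illustrated with the textbook sequential Doolittle factorization. This is a minimal sketch, not the paper's parallel algorithm: it does no pivoting, assumes every pivot is nonzero, and the function names `lu_doolittle` and `solve_lu` are hypothetical.

```python
def lu_doolittle(A):
    """Return unit-lower L and upper U with A = L U.

    Sequential Doolittle scheme without pivoting (assumes nonzero pivots).
    """
    n = len(A)
    L = [[float(i == j) for j in range(n)] for i in range(n)]
    U = [[0.0] * n for _ in range(n)]
    for k in range(n):
        for j in range(k, n):  # row k of U
            U[k][j] = A[k][j] - sum(L[k][m] * U[m][j] for m in range(k))
        for i in range(k + 1, n):  # column k of L (below the unit diagonal)
            L[i][k] = (A[i][k] - sum(L[i][m] * U[m][k] for m in range(k))) / U[k][k]
    return L, U


def solve_lu(L, U, b):
    """Solve (L U) x = b with one forward and one back substitution."""
    n = len(b)
    y = [0.0] * n
    for i in range(n):  # forward substitution: L y = b, L has a unit diagonal
        y[i] = b[i] - sum(L[i][j] * y[j] for j in range(i))
    x = [0.0] * n
    for i in reversed(range(n)):  # back substitution: U x = y
        x[i] = (y[i] - sum(U[i][j] * x[j] for j in range(i + 1, n))) / U[i][i]
    return x


L, U = lu_doolittle([[4.0, 3.0], [6.0, 3.0]])
print(solve_lu(L, U, [10.0, 12.0]))  # [1.0, 2.0]
```

Parallel LU algorithms such as the one in the abstract reorganize this triple loop so that many updates proceed at once; the arithmetic each entry needs is the same.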
Parallel Graph Coloring with Applications to the Incomplete-LU Factorization on the GPU
In this technical report we study different parallel graph coloring algorithms and their application to the incomplete-LU factorization. We implement graph coloring based on different heuristics and showcase their performance on the GPU. We also present a comprehensive comparison of level-scheduling and graph coloring approaches for the incomplete-LU factorization and triangular solve. We discu...
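As a rough illustration of why graph coloring helps here: if the adjacency graph of a structurally symmetric sparse matrix is colored so that no two neighbors share a color, rows of the same color have no direct coupling and could be processed concurrently. The sketch below uses the simplest greedy heuristic in natural vertex order; it is a sequential sketch assuming a symmetric adjacency dict, the names `greedy_coloring` and `color_classes` are hypothetical, and it does not reproduce the heuristics or GPU implementation studied in the report.

```python
def greedy_coloring(adjacency):
    """Give each vertex the smallest color not used by an already-colored neighbor.

    adjacency: dict vertex -> iterable of neighbors (assumed symmetric).
    Returns dict vertex -> color (0, 1, 2, ...).
    """
    colors = {}
    for v in sorted(adjacency):  # natural order; other heuristics change this visiting order
        used = {colors[u] for u in adjacency[v] if u in colors}
        c = 0
        while c in used:
            c += 1
        colors[v] = c
    return colors


def color_classes(colors):
    """Group vertices by color; each group is an independent set."""
    classes = {}
    for v, c in colors.items():
        classes.setdefault(c, []).append(v)
    return [classes[c] for c in sorted(classes)]


# Graph of a 5x5 tridiagonal matrix: the path 0-1-2-3-4.
adjacency = {i: [j for j in (i - 1, i + 1) if 0 <= j < 5] for i in range(5)}
print(color_classes(greedy_coloring(adjacency)))  # [[0, 2, 4], [1, 3]]
```

Two colors suffice for this path, so the rows split into two parallel stages; level scheduling on the same matrix would instead produce one level per row.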
Parallel Incomplete-LU and Cholesky Factorization in the Preconditioned Iterative Methods on the GPU
A novel algorithm for computing the incomplete-LU and Cholesky factorization with 0 fill-in on a graphics processing unit (GPU) is proposed. It implements the incomplete factorization of the given matrix in two phases. First, the symbolic analysis phase builds a dependency graph based on the matrix sparsity pattern and groups the independent rows into levels. Second, the numerical factorization...
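The symbolic analysis phase described above can be sketched for a lower-triangular factor: a row's level is one more than the deepest level among the rows it depends on (level 0 if it depends on none), and all rows in one level can then be processed concurrently. For simplicity the sketch applies the levels to a forward triangular solve rather than the factorization itself. It assumes a dense list-of-lists matrix for readability, whereas a GPU implementation would work on a compressed sparse format; the function names are hypothetical.

```python
def lower_pattern(L):
    """For each row i, the columns j < i with L[i][j] != 0 (the rows row i depends on)."""
    return [[j for j in range(i) if L[i][j] != 0] for i in range(len(L))]


def compute_levels(pattern):
    """Group rows into levels; rows within one level do not depend on each other."""
    level = []
    for deps in pattern:  # row order guarantees every dependency j < i already has a level
        level.append(1 + max(level[j] for j in deps) if deps else 0)
    levels = [[] for _ in range(max(level, default=-1) + 1)]
    for i, lv in enumerate(level):
        levels[lv].append(i)
    return levels


def forward_solve_by_levels(L, b, levels):
    """Solve L x = b level by level (L lower triangular with a nonzero diagonal)."""
    x = [0.0] * len(b)
    for rows in levels:  # levels must be processed in order
        for i in rows:  # independent within a level: a GPU could run these in parallel
            s = sum(L[i][j] * x[j] for j in range(i) if L[i][j] != 0)
            x[i] = (b[i] - s) / L[i][i]
    return x


L = [[2.0, 0.0, 0.0, 0.0],
     [1.0, 3.0, 0.0, 0.0],
     [0.0, 0.0, 4.0, 0.0],
     [0.0, 1.0, 1.0, 5.0]]
levels = compute_levels(lower_pattern(L))
print(levels)  # [[0, 2], [1], [3]]
print(forward_solve_by_levels(L, [2.0, 4.0, 4.0, 7.0], levels))  # [1.0, 1.0, 1.0, 1.0]
```

The amount of parallelism is the average level size: here four rows fall into three levels, so only rows 0 and 2 run together.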
Fine-Grained Parallel Incomplete LU Factorization
This paper presents a new fine-grained parallel algorithm for computing an incomplete LU factorization. All nonzeros in the incomplete factors can be computed in parallel and asynchronously, using one or more sweeps that iteratively improve the accuracy of the factorization. Unlike existing parallel algorithms, the new algorithm does not depend on reordering the matrix. Numerical tests show tha...
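The idea of computing every nonzero of the incomplete factors independently can be sketched as fixed-point sweeps over the sparsity pattern S of A: each entry is recomputed from the equation (LU)_ij = a_ij, giving l_ij = (a_ij - sum_{k<j} l_ik u_kj) / u_jj for i > j and u_ij = a_ij - sum_{k<i} l_ik u_kj for i <= j. This minimal sketch assumes ILU(0) (S is the nonzero pattern of A), a unit-diagonal L, dense list-of-lists storage, and a simple initial guess taken from the lower and upper parts of A. It performs synchronous (Jacobi-style) sweeps, whereas the abstract's algorithm also allows asynchronous updates; the function name is hypothetical.

```python
def ilu0_fixed_point(A, sweeps):
    """Approximate ILU(0) factors (unit-lower L, upper U) by fixed-point sweeps.

    Only entries in the nonzero pattern S of A are updated. Every update in a
    sweep reads the previous iterate, so all of them could run in parallel.
    """
    n = len(A)
    S = [(i, j) for i in range(n) for j in range(n) if A[i][j] != 0]
    # Initial guess (an assumption): strictly lower part of A for L, upper part for U.
    L = [[A[i][j] if j < i else float(i == j) for j in range(n)] for i in range(n)]
    U = [[A[i][j] if j >= i else 0.0 for j in range(n)] for i in range(n)]
    for _ in range(sweeps):
        L_new = [row[:] for row in L]
        U_new = [row[:] for row in U]
        for i, j in S:
            s = sum(L[i][k] * U[k][j] for k in range(min(i, j)))
            if i > j:
                L_new[i][j] = (A[i][j] - s) / U[j][j]
            else:
                U_new[i][j] = A[i][j] - s
        L, U = L_new, U_new
    return L, U


A = [[4.0, 1.0, 0.0, 0.0],
     [1.0, 4.0, 1.0, 0.0],
     [0.0, 1.0, 4.0, 1.0],
     [0.0, 0.0, 1.0, 4.0]]
L, U = ilu0_fixed_point(A, sweeps=10)
```

A single sweep gives only a rough approximation; each further sweep refines the factors. For a tridiagonal matrix like this one, ILU(0) drops no fill, so after enough sweeps the product LU reproduces A up to rounding.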
Journal
Journal title: Procedia Computer Science
Year: 2012
ISSN: 1877-0509
DOI: 10.1016/j.procs.2012.04.008